Sufficient Dimensionality Reduction with Irrelevance Statistics

Authors

  • Amir Globerson
  • Gal Chechik
  • Naftali Tishby
Abstract

The problem of unsupervised dimensionality reduction of stochastic variables while preserving their most relevant characteristics is fundamental for the analysis of complex data. Unfortunately, this problem is ill-defined, since natural datasets inherently contain alternative underlying structures. In this paper we address this problem by extending the recently introduced “Sufficient Dimensionality Reduction” feature extraction method [7] to use “side information” about irrelevant structures in the data. The use of such irrelevance information was recently demonstrated successfully in the context of clustering via the Information Bottleneck method [1]. Here we use this side-information framework to identify continuous features whose measurements are maximally informative about the main dataset but carry as little information as possible about the irrelevance dataset. In statistical terms, this can be understood as extracting statistics that are maximally sufficient for the main dataset while simultaneously being maximally ancillary for the irrelevance dataset. We formulate this problem as a tradeoff optimization problem and describe its analytic and algorithmic solutions. Our method is demonstrated on a synthetic example and on a real-world application to face images, showing its superiority over other methods such as Oriented Principal Component Analysis.
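The tradeoff described in the abstract can be illustrated with a toy score. The sketch below is an assumption-laden illustration, not the paper's algorithm: it assumes discrete variables, known joint distributions p(x, y) for the main data (written Y+) and the irrelevance data (written Y-), and a deterministic discrete feature T = f(X), and it scores L = I(T;Y+) - gamma * I(T;Y-). The actual method extracts continuous features through a maximum-entropy formulation. The function names `mutual_information`, `feature_joint` and `tradeoff`, the parameter `gamma`, and the toy data are all hypothetical.

```python
import numpy as np


def mutual_information(p_joint):
    """Mutual information I(A;B) in nats for a joint table p[a, b]."""
    p = np.asarray(p_joint, dtype=float)
    p = p / p.sum()                       # normalize defensively
    pa = p.sum(axis=1, keepdims=True)     # marginal p(a), shape (n, 1)
    pb = p.sum(axis=0, keepdims=True)     # marginal p(b), shape (1, m)
    mask = p > 0                          # 0 * log 0 is treated as 0
    return float(np.sum(p[mask] * np.log(p[mask] / (pa @ pb)[mask])))


def feature_joint(p_xy, f, n_values):
    """Joint p(t, y) induced by a deterministic discrete feature t = f[x]."""
    p_ty = np.zeros((n_values, p_xy.shape[1]))
    for x, t in enumerate(f):
        p_ty[t] += p_xy[x]                # pool all x mapped to the same t
    return p_ty


def tradeoff(p_x_yplus, p_x_yminus, f, n_values, gamma):
    """Hypothetical score L = I(T;Y+) - gamma * I(T;Y-) for T = f(X)."""
    relevant = mutual_information(feature_joint(p_x_yplus, f, n_values))
    irrelevant = mutual_information(feature_joint(p_x_yminus, f, n_values))
    return relevant - gamma * irrelevant


# Toy data (made up for illustration): X takes 4 values uniformly.
# Y+ (main data) is bit 0 of x; Y- (irrelevance data) is bit 1 of x.
p_plus = np.zeros((4, 2))
p_minus = np.zeros((4, 2))
for x in range(4):
    p_plus[x, x % 2] = 0.25
    p_minus[x, x // 2] = 0.25

f_relevant = [x % 2 for x in range(4)]     # keeps the main structure
f_irrelevant = [x // 2 for x in range(4)]  # keeps only the irrelevant structure

if __name__ == "__main__":
    for name, f in [("relevant", f_relevant), ("irrelevant", f_irrelevant)]:
        print(name, tradeoff(p_plus, p_minus, f, n_values=2, gamma=1.0))
```

With gamma = 1, the feature that keeps the main structure scores approximately ln 2 (about 0.693), while the one that keeps only the irrelevant structure scores approximately -0.693. This is the preference the tradeoff is meant to encode. Searching over candidate features, which is the actual optimization problem, is omitted here.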

Similar resources

Local Kernel Dimension Reduction in Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) is a popular sampling method in applications involving intractable likelihood functions. Without evaluating the likelihood function, ABC approximates the posterior distribution by the set of accepted samples which are simulated with parameters drawn from the prior distribution, where acceptance is determined by distance between the summary statistics of th...

On sufficient dimension reduction for proportional censorship model with covariates

The requirement of constant censoring parameter β in Koziol–Green (KG) model is too restrictive. When covariates are present, the conditional KG model (Veraverbekea and Cadarso-Suárez, 2000) which allows β to be dependent on the covariates is more realistic. In this paper, using sufficient dimension reduction methods, we provide a model-free diagnostic tool to test if β is a function of the cov...

Canonical kernel dimension reduction

A new kernel dimension reduction (KDR) method based on the gradient space of canonical functions is proposed for sufficient dimension reduction (SDR). Similar to existing KDR methods, this new method achieves SDR for arbitrary distributions, but with more flexibility and improved computational efficiency. The choice of loss function in cross-validation is discussed, and a two-stage screening pr...

Sufficient Dimension Reduction Summaries

Observational studies assessing causal or non-causal relationships between an explanatory measure and an outcome can be complicated by hosts of confounding measures. Large numbers of confounders can lead to several biases in conventional regression-based estimation. Inference is more easily conducted if we reduce the number of confounders to a more manageable number. We discuss use of sufficien...

Publication date: 2003